The Audio-Video Australian English Speech Data Corpus AVOZES
Abstract
This paper presents the Audio-Video Australian English Speech data corpus AVOZES. It contains recordings of 20 speakers uttering a variety of phrases. The corpus was designed for research on the statistical relationship of audio and video speech parameters with an audio-video (AV) automatic speech recognition (ASR) task in mind, but may be useful for other research tasks. AVOZES is the first published AV speaking-face data corpus for Australian English and is novel in its use of a stereo camera system for the video recordings and its modular design.
Similar sources
A Detailed Description of the AVOZES Data Corpus
The AVOZES data corpus has recently been made publicly available to other interested researchers. It is the first publicly available audio-video speech data corpus for Australian English. It contains recordings from 20 speakers, and the sequences provide both systematic coverage of the phonemes and visemes of Australian English and some application-driven utterances. AVOZES is also the...
A Stereo Vision Lip Tracking Algorithm and Subsequent Statistical Analyses of the Audio-Video Correlation in Australian English
Human perception of the world is inherently multi-sensory because the information provided is multimodal. The perception of spoken language is no exception. Besides the auditory information, there is also visual speech information, provided by the facial movements that result from moving the articulators during speech production. Visual speech information contributes to speech perception in all...
Statistical analysis of the relationship between audio and video speech parameters for Australian English
After decades of research, automatic speech processing has become more and more viable in recent years. Audio-video speech recognition has been shown to improve the recognition rate in noise-degraded environments. However, which audio and video speech parameters to choose for an optimal system and how they are related is still an open research issue. Here we present a number of statistical anal...
Stereo Vision Lip-tracking for Audio-video Speech Processing
We present the first results from applying a recently proposed novel algorithm for the robust and reliable automatic extraction of lip feature points to an audio-video speech data corpus. This corpus comprises 10 native speakers uttering sequences that cover the range of phonemes and visemes in Australian English. The lip-tracking algorithm is based on stereo vision which has the advantage of m...
Analysis of Audio-Video Correlation in Vowels in Australian English
This paper investigates the statistical relationship between acoustic and visual speech features for vowels. We extract such features from our stereo vision AV speech data corpus of Australian English. A principal component analysis is performed to determine which data points of the parameter curve for each feature are the most important ones to represent the shape of each curve. This is follow...